Leaders in any complex organization like the Army are constantly required to
make decisions intended to improve organizational performance. Effective
analysis and decision making by leaders require an understanding of
organizational functioning and the dynamics of organizational change in theory
and practice. Research can be designed to assist leaders in better
understanding how their organizations function and how they may be improved.

However,  for  such  research  to  provide  sound  guidance  to  leaders,  the  methods 
that  are  employed  must  be  capable  of  handling  the  complexities  of  dynamic 
individual  and  group  interaction.  Unfortunately,  many  of  the  methods  currently 
employed  by  social  scientists  are  best  suited  to  handling  less  complex  forms 
of  data. 


The  purpose  of  this  report  is  to  provide  researchers  with  statistical  tools 
that  will  assist  them  in  analyzing  complex  forms  of  data.  The  focus  of  this 
report  is  on  techniques  for  estimating  measurement  error,  using  scores  that 
are  aggregated  by  group.  These  scores  are  useful  for  evaluating  group  dynamics 
in  organizations  as  complex  as  the  Army. 


JOSEPH ZEIDNER
Technical Director
RELIABILITY ESTIMATION FOR AGGREGATED DATA: APPLICATIONS FOR ORGANIZATIONAL
RESEARCH


BRIEF 


Requirement: 

In  order  to  study  organizations  it  is  important  to  be  able  to  measure 
organizational  functioning  with  a  minimum  of  error.  The  report  that  follows 
provides the statistical tools necessary to measure the extent of error that
exists in survey data and organizational record data. Traditional methods of
measuring error are either inappropriate or incomplete when applied to
organizational groups, necessitating the statistical development given here.
Appropriate methods of measuring error are particularly important when organizational
change is being studied. In this case, the same variables are measured at more
than  one  point  in  time.  The  investigator  wants  to  identify  real  organizational 
change.  However,  real  change  cannot  be  separated  from  changes  in  measurement 
error,  unless  separate  estimates  of  measurement  error  are  available  at  each  point 
in  time.  This  paper  tells  how  to  get  separate  error  estimates  so  that  real 
organizational  change  can  be  studied. 


Procedure: 

When  research  is  conducted  in  an  organizational  setting,  group  units  of 
analysis  are  often  required.  When  group  units  of  analysis  are  used,  the  values 
of  the  variables  generally  consist  of  mean  scores  that  have  been  aggregated 
across  both  survey  items  and  respondents  within  groups.  Analysis  of  variance  was 
used  here  to  derive  the  appropriate  reliability  formulas  for  these  aggregated 
scores.  From  the  definition  of  reliability,  which  involves  the  ratio  of  true  to 
total  variance,  formulas  are  derived  by  finding  the  mean  square  components  that 
are  equivalent  to  the  reliability  definition.  This  requires  use  of  expected  mean 
squares  for  the  unit  of  analysis  term  and  other  "error"  terms.  Since  the 
aggregated  scores  typically  contain  repeated  observations  across  items  as  well  as 
survey  respondents,  with  respondents  nested  within  groups,  a  split-plot 
(repeated-measures)  design  can  usually  describe  the  structure  of  the  data,  with  a 
hierarchical structure added as needed. This split-plot design contains two
"error" terms: a split-plot (within-subjects) error term typically associated
with inter-item agreement, and a whole-plot (between-subjects) error term
associated with consensus between respondents. Both types of error can enter
into  the  reliability  formula  for  aggregated  scores,  depending  on  whether  survey 
items  and  respondents  are  considered  to  be  fixed  or  random,  which  in  turn  depends 
on  the  sampling  plan.  For  example,  respondents  may  be  fixed  (or  partially  fixed) 
if  the  populations  of  small  groups  are  exhaustively  sampled,  or  nearly  so.  When 
respondents  are  fixed,  the  appropriate  reliability  formula  is  not  the  same  as 
when  respondents  are  random. 


Findings:

Most of the literature on organizations that uses group units of analysis has
estimated reliability either incorrectly or inconsistently.

The survey construction and item analysis techniques that typically maximize
inter-item agreement may tend to reduce consensus between respondents. Surveys
like the Survey of Organizations, which were initially constructed to
maximize inter-item agreement, may therefore have poor reliability when consensus
between respondents is desired.

When  studying  groups  within  organizations,  what  level  of  the  hierarchy 
should  be  studied?  A  statistical  technique  for  estimating  the  level  of  the 
hierarchy that actually controls the subject matter at hand is provided. This
measure can be used as a guide for selecting groups at appropriate levels of
the hierarchy for study.


Utilization  of  Findings: 

These  statistical  techniques  provide  improved  procedures  for  studying  the 
operation  of  the  Army  and  other  organizations.  These  techniques  are  an 
essential  prerequisite  to  more  advanced  time-series  procedures  that  are  needed  to 
study organizational change. If an investigator wishes to examine real organizational
change, the analysis must take into account changes in measurement error.
Sometimes  change  appears  to  be  real  but  is  due  solely  to  changes  in  measurement 
error.  Change  in  measurement  error  instead  of  real  change  can  be  used  as  a 
plausible  alternative  explanation  for  almost  any  set  of  results  involving 
organizational  change.  If  separate  estimates  of  measurement  error  are  available 
at  each  point  in  time,  measurement  error  can  be  taken  into  account.  This  paper 
provides  the  tools  needed  to  get  appropriate  internal  consistency  estimates  of 
measurement  error,  and  to  show  how  these  estimates  change  with  time.  Once  these 
estimates  are  found,  real  organizational  change,  as  distinct  from  changes  in 
measurement  accuracy,  can  be  pinpointed. 
RELIABILITY ESTIMATION FOR AGGREGATED DATA:
APPLICATIONS FOR ORGANIZATIONAL RESEARCH


With  the  growth  of  organizational  development  over  the  last  twenty  years 
there  has  been  an  increase  in  field  research  on  the  functioning  of  intact 
organizations  (Porras,  1979).  Such  field  research  has  obvious  advantages  over 
laboratory  research  in  terms  of  the  possibilities  for  external  validity,  but  at 
the  same  time  researchers  working  with  intact  organizations  face  a  variety  of 
methodological  questions  that  have  not  been  satisfactorily  answered  to  date. 

One  very  basic  question  involves  the  selection  of  the  unit  of  analysis  for 
the  research  design.  Individuals  are  not  the  appropriate  unit  of  analysis  to 
test most hypotheses about group functioning. When individuals are not appropriate
units, which of many possible groups, at what level of the organizational
hierarchy, should be selected? The answer will be suggested by the hypotheses and
organizational structure. The researcher wishes to select units that are
responsible for and have control over the dependent variables. While organizational
structure and the hypotheses may suggest which groups at what hierarchical
level control particular variables, and thus provide an appropriate unit of
analysis, the researcher has no way to test this hypothesis to find out if in fact
groups at one level of the hierarchy provide a better unit of analysis than
groups at another level. In principle, if groups at one level of the hierarchy
are responsible for and have control over particular dependent variables, then we
should find homogeneity within and heterogeneity between the independently
operating groups on the dependent measures (see Jones & Jones, 1975; Bass,
Valenzi, Farrow, & Solomon, 1975). This phenomenon will be called the principle
of synchronization, and will be used later to show how to select appropriate
units of analysis.

Evidence  that  researchers  in  the  field  are  having  trouble  selecting  units  of 
analysis  is  suggested  by  the  inconsistency  with  which  a  particular  unit  of 
analysis  is  used.  Once  a  given  unit  of  analysis  is  selected,  this  same  unit 
should  be  used  for  stating  hypotheses,  calculating  reliabilities  and  norms  (when 
survey  feedback  is  involved),  estimating  validity,  and  generalizing  to  new 
populations.  A  common  problem  is  for  researchers  to  state  hypotheses  and 
generalizations in terms of intact organizational groups, but to calculate
reliabilities  and  estimate  validity  using  individuals  (see  Bowers,  1973;  also 
Passmore,  1976,  and  Torbert,  1973  for  a  critique  of  inconsistent  use  of  units  of 
analysis).  The  researcher  may  estimate  validity  with  groups  but  calculate 
reliabilities  using  individuals  (see  Taylor  &  Bowers,  1972,  p.  54  for  alternation 
between  using  groups  and  individuals  in  calculating  reliabilities). 

The researcher who tries to use units of analysis consistently by computing
reliabilities on the appropriate group units faces difficulties, since an
adequate outline of procedures for estimating reliability on aggregated scores
does not exist. Survey responses are aggregated across both items and respondents
within each group to produce the dependent variable scores. The sources of
true  and  error  variance  differ  in  these  aggregated  scores  from  the  same  sources 
of  variance  in  individual  level  scores,  since  the  structure  of  the  data  differs 
in  the  two  cases,  and  for  this  reason  the  formulas  for  estimating  reliability  on 
aggregated  scores  can  differ  from  the  common  formulas  used  with  individuals. 
Some  researchers  have  looked  at  inter-item  agreement,  and  others  at  agreement 
between  respondents  within  groups,  but  none  have  examined  both  sources  of 
agreement  in  an  integrated  way.  Researchers  have  looked  at  inter-item  agreement 
by  computing,  for  example,  Cronbach's  alpha  on  either  individuals  or  on  data 
aggregated  over  the  unit  of  analysis  for  each  item  (see  Taylor  &  Bowers,  1972); 
and at agreement between respondents by using either a variation of the intraclass
correlation (see Jones & Jones, 1977; Ebel, 1951; Bass et al., 1975) or an
iterative jackknife procedure (Schneider, 1972; Schneider & Bartlett, 1970).


Estimates of construct validity (Cronbach & Meehl, 1955) are in many cases
dependent  upon  adequate  measures  of  the  reliability  of  the  variables  involved. 
Construct  validity  consists  of  hypotheses  that  make  up  nomological  networks  of 
expected  relationships.  The  expected  relationships  involve  expectations  about 
differential  levels  of  association  among  variables.  Differential  levels  of 
association  are  frequently  studied  using  regression  or  path  analyses,  or  cross- 
lagged  correlation  analysis  (see  Kenny,  1975).  Statistics  that  measure  degrees 
of  association  among  variables  are  a  function  of  the  variables'  reliability  as 
well  as  the  degree  of  association  in  the  population  (McNemar,  1969,  p.  163).  Any 
attempt to measure differential levels of association must control for differential
levels of reliability, or demonstrate that differential levels of reliability
do not exist (Kenny, 1975; Jöreskog & Sörbom, 1979, chap. 4). Failure to
calculate reliabilities leaves alternate explanations open for any set of results.
In this sense, it is not possible to establish construct validity without taking
into account measurement error first, no matter what method of analysis is used:
regression, path, or cross-lagged panel correlation. In this way estimation of
validity is dependent on the measurement of reliability.

The  purpose  here,  then,  is  (a)  to  provide  criteria  for  selecting  appropriate 
units  of  analysis  within  intact  organizations,  and  (b)  to  provide  the  appropriate 
procedures  for  calculating  internal  consistency  reliabilities  on  the  aggregated 
group  scores.  These  internal  consistency  reliabilities  are  especially  important 
in  studies  of  organizational  change.  They  can  be  used  to  identify  possible 
reliability  shifts  over  time.  Real  organizational  changes  can  then  be  separated 
from  changes  in  measurement  error. 

An important advantage of using group units over the common approach of
using individuals is that it allows the researcher to study the nature of the
social interaction that occurs between subgroups within the unit (between blacks
and whites, superiors and subordinates, parents and children) in a way that is
not possible when individuals alone are the unit (see Hart, 1978, for an illustration
of this application). This is an advantage that has not been recognized, even by
researchers  with  appropriate  group  data  (see  Taylor  &  Bowers,  1972).  The 
structure  of  the  data  that  allows  interaction  to  be  studied  will  be  illustrated. 


Analysis  of  Variance 


Analysis of variance can be used both for reliability estimation (see Winer,
1971, pp. 283-296; Myers, 1966, pp. 294-299; Ebel, 1951) and estimation of
synchronization for selection of units of analysis. The model statements used
with aggregated data can be complex, involving many terms that may vary from
design to design. For this reason an analysis of variance algorithm is given
below, for balanced designs, that is more parsimonious than that provided by many
commonly used texts (e.g., Winer, 1971, pp. 371-375), to assist the reader with
subsequent material and to clarify terminology and notation that are not
completely standard.


Model  Statement 


Main  effect  terms  are  identified  by  a  single  alpha  character  in  caps. 
Nested  relationships,  if  any,  are  identified  by  additional  alpha  characters  in 
brackets  next  to  the  term  in  question,  showing  what  this  term  is  nested  within. 
Interactions are denoted by two or more alpha characters identifying the
interacting main effects. The full rank model includes interactions between all
combinations of terms, excluding, however, interactions between any terms that
share a common alpha character. Terms are ordered by examining the alpha
characters  denoting  terms.  If  the  alpha  characters  of  one  term  are  a  subset  of 
the  characters  of  another,  the  term  that  is  a  subset  must  be  placed  ahead  of  the 
other.  Nonnested  main  effect  terms  with  a  greater  number  of  other  terms  nested 
within  them  are  listed  ahead  of  the  nonnested  main  effects  with  fewer  other  terms 
nested  within  them. 


Expected  Mean  Squares 

Expected  mean  squares  (EMS)  identify  how  mean  squares  are  divided  into  the 
various  components  that  contribute  to  the  makeup  of  the  mean  square.  Since 
expected  mean  squares  are  essential  for  deriving  reliability  formulas,  the 
following algorithm can be used to derive expected mean squares in the balanced
case. To see whether the variance components for other terms occur in the
expected mean squares for the term in question, the alpha characters of the term
in question are examined in relation to the alpha characters of the other terms.
If the term in question is a subset of another term, then the complement of the
characters is taken. If all of the nonbracketed characters belonging to this
complement designate random factors, then the variance component for this other
term does occur in the expected mean squares. The coefficient for this variance
component is found by identifying the alpha characters not listed as part of the
other term. The product of the levels of the main effect terms not listed in this
way equals the coefficient.

Note. This algorithm, in similar form but with different notation, should be
attributed, to the author's knowledge, to Dr. Melvin Carter, Department of
Statistics, Brigham Young University.


Sums  of  Squares 

The sums of squares for any balanced complete-block design can be readily
obtained by: (a) taking the sum over levels of main effects not listed, for the
term in question; (b) next squaring and then summing over levels of main effects
that are listed; and finally, (c) dividing this sum by the product of
levels of main effects not listed. Then the sum of squares for the term in
question is obtained by subtracting all sums of squares of terms that are subsets
of the term in question. This includes the μ (grand mean) term.


Degrees  of  Freedom 

Degrees  of  freedom  for  each  term  are  obtained  by  taking  the  product  of  the 
levels  of  the  main  effects  that  are  listed  for  the  term  in  question,  and  then 
subtracting  the  degrees  of  freedom  of  all  terms  that  are  subsets  of  the  term  in 
question. Again this includes the μ term.
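
The sums-of-squares and degrees-of-freedom rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's program: it uses a reduced, hypothetical design (the nested terms A, B(A), C(AB) crossed with a single factor Q, one score per cell), invented numbers of levels, and simulated scores. The empty string stands for the μ term.

```python
import random
from itertools import product
from math import prod

# Hypothetical balanced design: C nested in B nested in A, all crossed with Q.
FACTORS = "ABCQ"
LEVELS = {"A": 2, "B": 3, "C": 5, "Q": 4}  # illustrative level counts
MODEL = ["", "A", "B(A)", "C(AB)", "Q", "AQ", "BQ(A)", "CQ(AB)"]  # "" = mu


def chars(term):
    """All alpha characters listed for a term, bracketed or not."""
    return frozenset(term.replace("(", "").replace(")", ""))


def subset_terms(term):
    """Model terms whose characters are a proper subset of this term's."""
    return [t for t in MODEL if chars(t) < chars(term)]


def df(term):
    """Product of levels of listed main effects, minus df of all subset terms."""
    listed = prod(LEVELS[f] for f in chars(term))
    return listed - sum(df(t) for t in subset_terms(term))


def raw_ss(term, data):
    """Steps (a)-(c): sum over unlisted factors, square, sum, then divide."""
    listed = [i for i, f in enumerate(FACTORS) if f in chars(term)]
    totals = {}
    for cell, y in data.items():
        key = tuple(cell[i] for i in listed)
        totals[key] = totals.get(key, 0.0) + y
    n_unlisted = prod(LEVELS[f] for f in FACTORS if f not in chars(term))
    return sum(t * t for t in totals.values()) / n_unlisted


def ss(term, data):
    """Raw sum of squares minus the sums of squares of all subset terms."""
    return raw_ss(term, data) - sum(ss(t, data) for t in subset_terms(term))


# Simulated scores, one per cell (made-up data for illustration only).
rng = random.Random(0)
cells = product(*(range(LEVELS[f]) for f in FACTORS))
data = {cell: rng.gauss(0.0, 1.0) for cell in cells}

for term in MODEL:
    print(f"{term or 'mu':8s} df={df(term):3d}  SS={ss(term, data):9.4f}")
```

Because this reduced model is saturated, the degrees of freedom (μ included) sum to the number of cells, and the sums of squares sum to the total of the squared scores, which gives a quick check on the bookkeeping.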


Data  Structure 


Overview 

Reliability  estimation  is  dependent  upon  specifying  the  structure  of  the 
data,  which  can  be  identified  with  an  analysis  of  variance  model  statement.  The 
following  analysis  of  variance  model  statement  illustrates  the  type  of  structure 
frequently  encountered  with  survey  data  taken  from  intact  organizational  groups. 
The  model  statement  is  used  to  describe  U.S.  Army  organization,  but  could  equally 
fit  most  organizations,  and  is  used  as  an  example  throughout  the  paper. 
Y = μ + A + B(A) + C(AB) + R + AR + BR(A) + CR(AB) + S(ABCR) + Q + AQ +
    BQ(A) + CQ(AB) + RQ + ARQ + BRQ(A) + CRQ(AB) + SQ(ABCR) + E(ABCRSQ)      (1)

where, A = 1, a; brigade, random
       B = 1, b; battalion, fixed
       C = 1, c; company, fixed (except where explicitly specified as random)
       R = 1, r; race, fixed
       S = 1, s; subjects, fixed or random
       Q = 1, q; questionnaire items, fixed or random
       E = 1, 1; error, random
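
As a check on the model-statement rules given earlier, the terms of Equation 1 can be regenerated from its seven main effects. The sketch below is a hypothetical helper, not part of the report: it forms every combination of main effects, drops any combination in which two terms share an alpha character, and writes the remaining interactions with the nesting letters in brackets. It does not attempt to reproduce the ordering rules, only the set of terms.

```python
from itertools import combinations

ORDER = "ABCRSQE"  # order in which letters are written inside a term
MAIN_EFFECTS = ["A", "B(A)", "C(AB)", "R", "S(ABCR)", "Q", "E(ABCRSQ)"]


def parse(term):
    """Split 'C(AB)' into its unbracketed and bracketed letters."""
    live, _, rest = term.partition("(")
    return set(live), set(rest.rstrip(")"))


def spell(letters):
    """Write a set of letters in the conventional order."""
    return "".join(sorted(letters, key=ORDER.index))


def full_rank_model(main_effects):
    """Main effects plus every interaction among terms sharing no letter."""
    terms = []
    for size in range(1, len(main_effects) + 1):
        for combo in combinations(main_effects, size):
            parsed = [parse(t) for t in combo]
            letter_sets = [live | nested for live, nested in parsed]
            if any(x & y for x, y in combinations(letter_sets, 2)):
                continue  # two terms share a letter: no interaction
            live = set().union(*(p[0] for p in parsed))
            nested = set().union(*(p[1] for p in parsed))
            terms.append(spell(live) + (f"({spell(nested)})" if nested else ""))
    return terms


model = full_rank_model(MAIN_EFFECTS)
print(len(model), "terms:", " + ".join(model))
```

Under these rules the seven main effects yield eleven interactions, for example BRQ(A) and SQ(ABCR), while terms such as AB never appear because B is already nested within A.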


An  Army  company  consists  of  approximately  150  soldiers  who  work  together. 
There  are  five  companies  within  a  battalion  and  three  battalions  within  a 
brigade.  The  hierarchical  nature  of  the  organization  is  specified  by  the 
completely nested hierarchical portion of the design (A, B, and C). Assuming
enough units were available, either brigades, battalions, or companies could be
selected as the unit of analysis. Nesting any number of hierarchical levels is
possible. The hierarchical data structure is a very general one that can be
applied to most organizations in many societies. It can apply also to
generational hierarchies in groups organized along familial lines. Mixed hierarchies
can also be examined with families nested within the parental occupational
organization(s).


Following the hierarchical part of the design, the term Race (R) appears,
which crosses the hierarchical groups (i.e., it is not nested within them). This
crossed term, whether it designates a variable like race (black-white), or rank
(supervisor-subordinate), or even generation (parent-child), designates
subgroups that represent repeated measurements across the unit of analysis (e.g.,
companies, families). Repeated measurements across the unit of analysis can be
used to examine the interaction between the subgroups that are repeated, by
correlating the responses of the subgroups across the units, and when available,
across time using cross-lagged panel correlation or path analysis (see Hart,
1978). Interaction between subgroups can be examined over time in this manner.
In addition to the single crossed term Race (R), other crossed terms designating
subgroups with their associated interaction terms are possible, as well as
covariates without interactions.


The term representing Questionnaire items (Q) is crossed with both the
nested Subjects term (S) and the hierarchical terms (A, B, C), which means
questionnaire items can be considered repeated measures in two ways: across both
subjects and the unit of analysis (A, B, or C). Just one such term is expected,
representing survey items. Succeeding terms represent interactions with Q. Data
that are repeated in both ways contain common-method variance (see Campbell &
Fiske, 1959) not found in data repeated only across the unit of analysis, so that
correlations between variables that are repeated in both ways should be inflated
in relation to correlations based on data that are repeated only across the unit
of analysis and not across subjects. Data that are repeated in two ways are
represented by the ratings of a single subgroup, within the unit of analysis, on
two different scales, while data that are repeated in only one way are represented
by ratings from two different subgroups on two different scales. Methods of
reliability estimation that use the communality between all variables in an
analysis (see Kenny, 1975, pp. 897-899; Jöreskog & Sörbom, 1979, chap. 4) are
not appropriate for data structures, as above, in which correlations are
influenced by whether the variable is "repeated" in more than one way. Internal
consistency reliabilities are preferable with the above data structure.

Overall, the model can be considered a hierarchical split-plot (or
repeated-measures) design. The Q term and interactions with Q represent Within-Subjects
variance, while the hierarchical and crossed terms with their interactions
represent Between-Subjects variance, as found in a split-plot
(repeated-measures) design. The between-subjects variance can be further divided into
two parts (the hierarchical part representing Between-Groups variance, and the
crossed term(s) with their interactions representing Within-Groups variance),
thus creating the hierarchical split-plot design. Analysis of variance designs
like the above generally have more than one error term. For example, the term SQ
can be considered an appropriate error term to test within-subjects terms, and S
an error term to test between-subjects terms. Furthermore, the hierarchical terms C
and B might be considered error terms under some circumstances. Error terms are
dictated not only by the model but also by the terms considered fixed and random.
The  determination  of  whether  a  term  is  fixed  or  random  depends  on  the  sampling 
plan  of  the  design. 


Sampling  Plans 

In  the  previous  model  statement,  Brigades  (A)  may  have  been  sampled  in  a 
random  or  at  least  representative  fashion,  while  Battalions  (B)  and  Companies  (C) 
may  have  been  sampled  in  an  exhaustive  fashion.  Brigades  may  therefore  be  random 
while  battalions  and  companies  within  brigades  are  fixed  since  the  population  of 
these units was exhaustively sampled. In the preceding example the nested
hierarchical terms B and C were fixed, but in rare cases such terms could be
random. For example, if countries were used as a unit of analysis, and in the
sampling plan cities were randomly selected to represent countries, with subjects
randomly selected within cities, the nested hierarchical term, cities, could be
random as well as subjects.

The Subjects term (S) in the previous example, nested within Companies (C)
and Race (R), will be considered fixed or random depending on how exhaustively
the population of subjects within companies is sampled. The subjects term is
fixed when all soldiers (approximately 150) are sampled, and random when a very
small fraction of the company population is sampled. The fixed-random
distinction is determined by the sampling fraction (s/N, sample size over population
size), with terms fixed when the ratio is one and random when the ratio approaches zero.
In practice, the subjects term often will be neither fixed nor random. The
company populations are quite small and it is not unusual at all for a sampling
plan to call for sampling a fraction of the population (e.g., 1/3) that
approaches neither one nor zero. In these cases, the subjects term will be
labeled semirandom. The Questionnaire items (Q) may likewise be considered
random  if  the  items  in  the  survey  are  considered  a  random  selection  of  a 
potentially  infinite  population  of  items  measuring  the  same  concept,  or  fixed  if 
the  items  are  considered  to  exhaust  the  population  of  interest. 

Subjects  could  be  considered  random  or  semirandom  and  items  fixed  in  a 
cross-lagged  correlation  design  using  groups  as  the  unit  of  analysis  (see  Hart, 
1978).  In  this  design,  a  sample  of  subjects  within  companies  can  be  selected  to 
represent  the  whole  company  population,  so  subjects  are  random  or  semirandom. 
Cross-lagged correlation looks at time-related changes assuming stationarity,
that is, constant item structure over time (Kenny, 1975). In such cases it may often be
reasonable to assume items are fixed when looking at time-related changes in this
way. Likewise, subjects can be considered fixed and items random in most
single-time, survey-feedback designs. In this case, entire company populations are
frequently  sampled,  while  items  are  considered  a  sample  of  a  larger  conceptual 
population.  In  this  sampling  plan  subjects  become  fixed  and  items  random.  Of 
course,  in  many  designs  both  subjects  and  items  may  be  random  or  at  least 
semirandom. 


Reliability  Formulas 


Derivation 

The  sampling  plans  given  above  have  a  direct  impact  on  the  appropriate 
reliability  formulas.  A  requirement  for  measuring  reliability  is  to  divide  the 
variance  associated  with  the  unit  of  analysis  into  true  and  error  components. 
The  unit  of  analysis  in  this  case  is  an  aggregated  group  score  instead  of  an 
individual  response.  If  the  unit  of  analysis  is  the  Companies  term  (C),  the 
expected  mean  squares  for  this  term  show  the  underlying  components  that  are 
expected  in  the  make-up  of  the  observed  mean  square.  These  underlying  components 
can  be  divided  into  true  and  error  variance.  This  provides  a  way  of  allocating 
the  observed  company  mean  square  into  true  and  error  components.  The  sampling 
plan  determines  which  terms  are  fixed  and  random.  This  in  turn  affects  the 
expected  mean  squares  for  the  unit  of  analysis  and  the  allocation  of  true  and 
error  components  to  the  observed  mean  square,  which  then  affects  the  reliability 
formula.  Table  1  shows  how  the  expected  mean  squares  in  the  balanced  case 
change, for selected terms, as a function of whether Subjects (S) and
Questionnaire items (Q) are considered fixed or random. Reliability is defined as the
ratio of true to total variance. The variance component defined as true variance
is always that component associated with the unit of analysis, in this case
either Companies (C), Battalions (B), or Brigades (A). As indicated by Table 1,
there is more than one "error" term when both items and subjects are random. In
general, as the number of random main effects following the unit of analysis
increases, the number of components considered to be error increases
dramatically (see Formula 11, Table 2).


Table 1

Balanced Expected Mean Squares with Fixed/Random
Subjects (S) and Items (Q) (see Notes 1 and 2)

Term                          Expected Mean Squares

A         brigade             $bcrsq\,\sigma_A^2 + [q\,\sigma_S^2] + (bcrs\,\sigma_{AQ}^2) + ([\sigma_{SQ}^2]) + \sigma_E^2$
B(A)      battalion           $crsq\,\sigma_B^2 + [q\,\sigma_S^2] + (crs\,\sigma_{BQ}^2) + ([\sigma_{SQ}^2]) + \sigma_E^2$
C(AB)     company             $rsq\,\sigma_C^2 + [q\,\sigma_S^2] + (rs\,\sigma_{CQ}^2) + ([\sigma_{SQ}^2]) + \sigma_E^2$
S(ABCR)   subjects            $q\,\sigma_S^2 + (\sigma_{SQ}^2) + \sigma_E^2$
AQ        brigade X items     $bcrs\,\sigma_{AQ}^2 + [\sigma_{SQ}^2] + \sigma_E^2$
BQ(A)     battalion X items   $crs\,\sigma_{BQ}^2 + [\sigma_{SQ}^2] + \sigma_E^2$
CQ(AB)    company X items     $rs\,\sigma_{CQ}^2 + [\sigma_{SQ}^2] + \sigma_E^2$
SQ(ABCR)  subjects X items    $\sigma_{SQ}^2 + \sigma_E^2$

Note 1. The model and notation are found in the text (see Equation 1). The term A is
random with B and C fixed. Subjects (S) and Questionnaire Items (Q) are either
fixed or random. Lower case letters denote the number of levels of the
corresponding factors in caps.

Note 2. When subjects are fixed, terms within brackets are deleted. When
questionnaire items are fixed, terms within parentheses are deleted.
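
The dependence of the expected mean squares on the sampling plan can be traced mechanically with the algorithm described under Expected Mean Squares. The following Python sketch is an illustration only, assuming the terms of Equation 1 and treating E as having a single level so that it never contributes to a coefficient; the function name and the `var[...]` output notation are hypothetical conveniences, not the report's notation.

```python
MODEL = [
    "A", "B(A)", "C(AB)", "R", "AR", "BR(A)", "CR(AB)", "S(ABCR)",
    "Q", "AQ", "BQ(A)", "CQ(AB)", "RQ", "ARQ", "BRQ(A)", "CRQ(AB)",
    "SQ(ABCR)", "E(ABCRSQ)",
]
FACTORS = "ABCRSQ"  # E has one level, so it never enters a coefficient


def parse(term):
    """Split 'C(AB)' into its unbracketed and bracketed letters."""
    live, _, rest = term.partition("(")
    return set(live), set(rest.rstrip(")"))


def expected_mean_square(term, random_factors):
    """List the variance components in the EMS of `term` (balanced case)."""
    live_t, nested_t = parse(term)
    letters_t = live_t | nested_t
    components = []
    for other in MODEL:
        live_o, nested_o = parse(other)
        letters_o = live_o | nested_o
        if not letters_t <= letters_o:
            continue  # the term in question is not a subset of this term
        # Unbracketed letters of the complement must all be random factors.
        if (live_o - letters_t) <= random_factors:
            coef = "".join(f.lower() for f in FACTORS if f not in letters_o)
            components.append(f"{coef}*var[{other}]" if coef else f"var[{other}]")
    return " + ".join(components)


both_random = {"A", "S", "Q", "E"}     # subjects and items random
subjects_fixed = {"A", "Q", "E"}       # subjects fixed, items random

print(expected_mean_square("C(AB)", both_random))
print(expected_mean_square("C(AB)", subjects_fixed))
print(expected_mean_square("CQ(AB)", subjects_fixed))
```

With subjects fixed and items random, the company term retains only the company, company-by-items, and error components, which is the case used for definition 3 in the derivation that follows.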

Reliability for the group mean scores is formally defined in Table 2. The
expected mean squares, shown in Table 1, for the unit of analysis (C), are divided
by rsq, the product of the levels that are added to obtain the group means. The
divided expected mean squares represent the components expected in the group
means, components that vary according to the sampling plan. The component due to
the unit of analysis (C), divided by all components, represents the ratio of true
over total variance needed for the reliability definition. Mean square terms are
set equal to the corresponding expected mean squares, and then the equations are
solved for the variance components. For example, the variance components for
definition 3 in Table 2 equal:

$\sigma_C^2 = (MS_C - MS_{CQ}) / rsq; \quad rs\,\sigma_{CQ}^2 + \sigma_E^2 = MS_{CQ}.$


The  mean  square  estimates  of  the  variance  components  are  substituted  for  the 
corresponding  variance  component  in  the  reliability  definition,  and  then  simpli¬ 
fied  algebraically.  This  process  produced  the  reliability  formulas  in  Table  2. 
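The substitution step can be illustrated with a minimal Python sketch. It assumes the definition 3 components quoted above (subjects random, items fixed) and a reliability definition of true variance over true plus error variance in the company means; the function name and the numeric mean squares are hypothetical, chosen only for illustration.

```python
def reliability_definition_3(ms_c, ms_s, r, s, q):
    """Company-mean reliability with subjects random and items fixed.

    Illustrative only: ms_c and ms_s are the company and subject mean
    squares, and r, s, q are the numbers of levels averaged into each mean.
    """
    n_per_mean = r * s * q
    var_c = (ms_c - ms_s) / n_per_mean   # estimate of sigma^2_C
    var_error = ms_s / n_per_mean        # (q*sigma^2_S + sigma^2_E) / rsq
    by_definition = var_c / (var_c + var_error)
    simplified = (ms_c - ms_s) / ms_c    # same ratio after cancelling rsq
    return var_c, by_definition, simplified


var_c, by_definition, simplified = reliability_definition_3(
    ms_c=12.0, ms_s=3.0, r=2, s=10, q=5
)
print(var_c, by_definition, simplified)
```

The two reliability values agree because the divisor rsq cancels, which is why the simplified formula needs only the mean squares.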


The unit of analysis for Formulas (3) through (10) is Companies (C). When
the unit is Battalions (B) or Brigades (A), the definitions and reliability
formulas are the same as in Table 2, with the following substitutions:
(a) $\sigma^2_C$ becomes $\sigma^2_B$ or $\sigma^2_A$; (b) $\sigma^2_{CQ}$ becomes
$\sigma^2_{BQ}$ or $\sigma^2_{AQ}$; (c) $MS_C$ becomes $MS_B$ or $MS_A$; and
(d) $MS_{CQ}$ becomes $MS_{BQ}$ or $MS_{AQ}$. When the unit of analysis is
Battalions (B), the terms including B are substituted, and when the unit is
Brigades (A), the terms including A are substituted. The error terms in the
denominator of the reliability definitions are divided by an additional
coefficient c for Battalions and bc for Brigades.


Estimating  reliability  involves  estimating  ratios  of  variance  components. 
The  expectation  of  these  ratios  contains  a  slight  positive  bias.  Winer  (1971, 
pp.  248-249;  282-290)  has  given  a  correction  for  this  bias  for  the  standard 
formulas (Formula 2, Table 2; Formula 26, Table 4). This correction, when
extended  to  any  of  the  formulas  in  Table  2,  has  the  following  form: 


$$\frac{MS_{unit} - \dfrac{df_{error}}{df_{error} - 2}\, MS_{error}}{MS_{unit}} \tag{12}$$


where $MS_{unit}$ is the mean square for the unit of analysis and $MS_{error}$
represents the mean square term(s) measuring error. The term(s) subtracted from
$MS_C$ in the numerator of the formulas in Table 2 are error. In words, the
correction involves multiplying $MS_{error}$ by a correction term that approaches
one as the degrees of freedom for error increase. When $MS_{error}$ involves more
than one mean square term, the adjusted degrees of freedom for these several terms
are found by referring to Formula (24) given later. For all practical purposes the
positive bias in the reliability formulas in Table 2 is negligible with as many
degrees of freedom for $MS_{error}$ as is customary with organizational surveys.
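As a rough illustration of how quickly the correction vanishes, the following Python sketch assumes Formula (12) has the form $(MS_{unit} - [df_{error}/(df_{error} - 2)]\,MS_{error}) / MS_{unit}$. The function name and the mean squares are hypothetical.

```python
def corrected_reliability(ms_unit, ms_error, df_error):
    """Bias-corrected reliability, assuming the form of Formula (12) above."""
    if df_error <= 2:
        raise ValueError("df_error must exceed 2 for the correction to exist")
    inflation = df_error / (df_error - 2)   # approaches 1 as df_error grows
    return (ms_unit - inflation * ms_error) / ms_unit


uncorrected = (12.0 - 3.0) / 12.0
corrected_small_df = corrected_reliability(12.0, 3.0, df_error=10)
corrected_large_df = corrected_reliability(12.0, 3.0, df_error=1000)
print(uncorrected, corrected_small_df, corrected_large_df)
```

With ten error degrees of freedom the estimate drops noticeably, while with a thousand it is almost indistinguishable from the uncorrected value, in line with the remark that the bias is negligible for typical survey samples.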


Another bias may be more serious. As with any analysis of variance design,
if significant terms are omitted from the model statement, these omitted terms
will artificially inflate $MS_{error}$. Reliability will be underestimated to the
extent significant terms are omitted from the model statement. For example,
omitting Race (R) when it, or its interactions, are significant, increases the
size of $MS_S$. It is desirable to specify model statements that capture the
structure of the data as completely as possible even if this creates model
statements with large numbers of terms.


Interpretation 

The  reliabilities  are  internal  consistency  measures  of  reliability.  As  such 
they  represent  reliability  at  any  one  discrete  point  in  time.  At  this  point  in 
time  the  reliabilities  measure  the  extent  to  which  the  researcher  would  expect  to 
obtain  the  same  thing  if  the  measurement  process  were  repeated.  They  estimate 
the  correlation  between  the  mean  scores,  for  the  unit  of  analysis,  and  another 
set  of  mean  scores  that  would  be  expected  if  the  measurement  process  had  been 
repeated  at  the  same  time.  The  reliability  would  also  be  considered  an  estimate 
of  the  correlation  between  the  observed  sample  means  and  the  means  that  would 
have  been  obtained  if  the  entire  population  of  subjects/items  had  been  measured. 


The sampling plans differ for different reliability formulas. Sampling is
conducted without replacement (i.e., no respondent takes the survey twice at one
time), which creates the practical effect of sampling from a population that can
be considered finite. When subjects are fixed, the "observations" that make up
the variation due to subjects, $\sigma^2_S$, remain the same in the hypothetical
new sample as they were in the observed sample, and when subjects are semirandom
the proportion of these elements in each group that remain the same equals
$s / N_s$ (sample over population size). Likewise, when items are fixed, the
"observations" due to the component $\sigma^2_Q$ are identical in the observed and
hypothetical new sample, and in the semirandom case the proportion of elements
that are the same equals $q / N_q$.

When the sample size equals the population size (i.e., the term is fixed), the
same scores are selected twice, the mean scores are measured without error, and
the reliability is perfect. When a term is semirandom, the hypothetical new sample
will contain a proportion n / N of elements in common with the old sample and the
population. When a term is random, none of the elements that make up that
component remains the same in the new sample or population. Declaring a term
fixed or random, then, is the same as assuming the elements that go into a
particular variance component either change or do not change from the observed
sample to a hypothetical new one or to the population. They do not change if the
sample size equals the population size.
Relationship Between Formulas

In  fact,  there  is  a  close  connection  between  average  intercorrelation,  and 
reliability  as  computed  by  Cronbach's  alpha,  and  analysis  of  variance. 
Cronbach's  alpha  is  identical  to  the  Spearman-Brown  prediction  formula  applied  to 
the average intercorrelation between items (see Ebel, 1951). Formula 2 in
Table 2 differs from Cronbach's alpha only in that analysis of variance, with its
attendant assumptions, is used to estimate the average intercorrelation between
items (see Formula 26, Table 4). This estimate of the average intercorrelation
(Formula 26), when corrected by the Spearman-Brown prediction formula, equals
Formula 2.
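The link between the two approaches can be checked numerically. The sketch below is a minimal illustration on simulated data with individuals as the unit of analysis. It assumes Formula 2 takes the usual analysis of variance form $(MS_S - MS_{SQ}) / MS_S$, which may differ in detail from the entry in Table 2; the data and function names are hypothetical.

```python
import random


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(scores):
    """scores: one row per subject, one column per item."""
    k = len(scores[0])
    item_variances = [sample_variance([row[j] for row in scores]) for j in range(k)]
    total_variance = sample_variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def anova_reliability(scores):
    """(MS_S - MS_SQ) / MS_S from a subjects-by-items layout."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_residual = sum(
        (scores[i][j] - subject_means[i] - item_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_s = ss_subjects / (n - 1)
    ms_sq = ss_residual / ((n - 1) * (k - 1))
    return (ms_s - ms_sq) / ms_s


rng = random.Random(42)
scores = []
for _ in range(30):                      # 30 simulated subjects, 5 items
    true_score = rng.gauss(0, 1)
    scores.append([true_score + rng.gauss(0, 1) for _ in range(5)])

alpha = cronbach_alpha(scores)
anova = anova_reliability(scores)
print(alpha, anova)
```

On any complete subjects-by-items table the two values coincide up to rounding, so the choice between them is a matter of which sampling assumptions the researcher is willing to state.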

When computing reliability for aggregated scales, researchers typically
compute Cronbach's alpha on group means, computed separately for each item, which
is the same as computing the average intercorrelation between these item means
and adjusting the average correlation with the Spearman-Brown prediction
formula. This is closely approximated by Formula 5, Table 2. The average
intercorrelation between company mean scores for each item is estimated by
Formula 27, Table 4. When this analysis of variance estimate of the average
intercorrelation is corrected by the Spearman-Brown prediction formula it equals
Formula 5. The use of Cronbach's alpha to estimate the reliability of group mean
scores requires the same sampling assumptions as does Formula 5: subjects fixed
and items random. When subjects are sampled from large intact organizational
groups, Formula 5 is not appropriate and neither is Cronbach's alpha. For example,
Taylor and Bowers (1972) used Cronbach's alpha both on exhaustive and ten percent
samples of subjects. Formula 5 should have given way to Formula 8 with the ten
percent sample if the assumption of random items had been made.

A comparison of Formulas (2) and (3), Table 2, shows an interesting
relationship between variance components. When individuals are used as the unit
of analysis, the between subjects variance $\sigma^2_S$ represents true variance,
but when companies are the unit, and subjects are random, as in Formula 3, the
term $\sigma^2_S$ represents error variance. It is true that the subjects
components $\sigma^2_S$ are not identical in the two cases since the models
differ, but they are very similar. The subjects mean square ($MS_S$) in Formula 3
has been reduced compared to the subjects mean square $MS_S$ in Formula 2, to the
extent that other "between subjects" terms from the model in Equation 1 are
significant, but otherwise the terms are the same. Maximizing the variance
between subjects will increase reliability as measured by Formula 2, but can
decrease it as measured by Formula 3. In constructing the Survey of Organizations
(see Taylor & Bowers, 1972), "between subjects" variance was maximized by such
techniques as (a) positive wording of all questions, (b) contiguous placement of
items from the same scale, (c) positive response alternatives lined up on the same
side of the scale, and (d) selection of items with large "between subjects"
distributions. These techniques will maximize reliability as measured by
Formula 2. The previous techniques seem to maximize subject differences by
increasing variance due to response sets. If this is the case, this subject
variance would be expected to inflate $MS_S$ as error in Formula 3. It is
possible that these
techniques also reduce $\sigma^2_E$, so they may not always increase $MS_S$ as
error. In Formula 3 we wish to maximize $MS_C$ in relation to $MS_S$. The
preceding techniques used in the Survey of Organizations could easily, but not
necessarily, increase $MS_S$ in relation to $MS_C$, reducing reliability. Since
the Survey of Organizations, and others like it, use intact organizational groups
as units, Formula 3 rather than 2 is most appropriate and should be used when
subjects alone are random.


Formulas 2 and 5 have generally been used to establish reliability for
organizational surveys. It should be apparent from Table 2 that there is no
necessary relationship between reliability as measured by Formula 5 and 3.
Furthermore, there may sometimes be a negative relationship between reliability
as measured by Formula 2 and 3. Organizational surveys that claim to have well
established reliabilities, using Formulas 2 or 5, have not established
reliability at all for the situations in which Formulas 3, 7, 8, 9 or 10 are most
appropriate. In fact, it is reasonable to suppose that many of these "well
established reliabilities" will not prove to be reliable at all as measured by
Formula 3, since no attempt has been made, using pretest samples, to select items
that discriminate well between group units, while a corresponding effort has been
made to find items that have high intercorrelations. It is important to find
which scales are in fact reliable using appropriate formulas. Research in this
direction may require a reassessment of the reliabilities of the scales used in
organizational research, as well as interpretations of results in this area.


Reliability  for  Record  Data 


Frequently variables representing group units of analysis are not measured
by survey but can be found in the form of frequency counts of events within the
group that occurred during a given time period. Often these frequency counts are
expressed in the form of rates (e.g., per 1000) or percentages. The use of rates
or percentages is generally not a good idea when the variables are to be
correlated, since this creates the attendant problems of index correlation (see
McNemar, pp. 180-182). A better approach is to use the raw frequency counts, and
partial out the effects of sample size (Cronbach & Furby, 1970). Reliability for
such frequency counts can be computed using analysis of variance, with the group
size variable used as a covariate. The model in this case differs slightly from
that shown in Equation (1). The following model defines the structure of the
data in the case with three levels of hierarchy:

$$Y = A + B(A) + C(AB) + D + AD + BD(A) + CD(AB) + E(ABCD) \tag{13}$$

where, A = 1, ..., a; brigade, random
       B = 1, ..., b; battalion, fixed
       C = 1, ..., c; company, fixed
       D = 1, ..., d; generally dichotomous split of frequencies, random
       E = 1, ..., 1; error, random


The addition of another crossed term like Race (R), that is fixed, does not affect
the reliability definition or formula, so it was omitted. In addition to the
above model the group size variable can be added as a covariate. The term D can
represent either a random dichotomous split, or a dichotomous split that controls
for a variable like time (e.g., one level represents events that occurred on odd
numbered days and the other level events that occurred on even numbered days for
the time period in question). The split may have to be random when the time
variable is not available on a case by case basis. The fact that a random split
is possible means that an internal consistency reliability can be computed when
only frequency counts are available for each group. Researchers often assume it
is not possible to compute reliability in this case. The reliability definition
and formula are given as follows:


[Formula (14), the reliability definition and formula for record data, is illegible in the source.]


When random splits within groups are necessary to obtain the observations for the
term D, greater stability in the reliability estimates can be obtained by a
jackknife procedure in which the error mean square in Formula (14) is estimated
several times using different random splits each time. The different estimates
can then be averaged prior to using the averaged estimate in Formula (14). When
the term D is fixed the record variable in question is considered to be measured
without error and an estimate of reliability is not needed. This would occur if
(a) the researcher was willing to limit generalizations to that particular
variable alone, and (b) the frequencies of that variable were a census rather
than sample of the relevant events.
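The repeated-split idea can be sketched in Python under several simplifying assumptions that are not taken from the report: a single level of hierarchy (companies only), no group size covariate, a dichotomous split (d = 2), and a reliability ratio of the form $(MS_C - MS_{CD}) / MS_C$, by analogy with the other formulas in which the error term is subtracted from $MS_C$. The event counts and function names are hypothetical.

```python
import random


def split_mean_squares(counts, rng):
    """Randomly split each company's events in two; return (MS_C, MS_CD)."""
    halves = []
    for n in counts:
        first = sum(rng.random() < 0.5 for _ in range(n))
        halves.append((first, n - first))
    c = len(counts)
    grand = sum(counts) / (2 * c)
    row_means = [n / 2 for n in counts]
    col_means = [sum(h[j] for h in halves) / c for j in range(2)]
    ms_c = 2 * sum((m - grand) ** 2 for m in row_means) / (c - 1)
    ss_cd = sum(
        (halves[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(c) for j in range(2)
    )
    ms_cd = ss_cd / (c - 1)              # (c - 1)(d - 1) with d = 2
    return ms_c, ms_cd


def split_reliability(counts, n_splits=200, seed=0):
    """Average the error mean square over many random splits."""
    rng = random.Random(seed)
    results = [split_mean_squares(counts, rng) for _ in range(n_splits)]
    ms_c = results[0][0]                 # does not depend on the split
    mean_ms_cd = sum(ms_cd for _, ms_cd in results) / n_splits
    return ms_c, mean_ms_cd, (ms_c - mean_ms_cd) / ms_c


counts = [12, 30, 7, 45, 22, 18, 60, 9]  # events recorded per company
ms_c, mean_ms_cd, reliability = split_reliability(counts)
print(ms_c, mean_ms_cd, reliability)
```

Because the company totals are unchanged by any split, only the error mean square varies from split to split, which is why averaging it stabilizes the estimate.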


Significance  Tests 


Difference  of  Reliability  from  Zero 


It is important to ask if it is possible to detect a significant amount of
true variance at all, i.e., is the reliability coefficient significantly
different from zero. One form in which this test can be made is to compare total
to error variance, forming an F ratio, to see if a detectable amount of true
variance exists. The form of the F test differs slightly from the reliability
ratio (true over total variance), but provides a test with the same components.
The test definitions and F tests for reliability Formulas 3 through 10 are shown
in Table 3. The error terms in the denominators of the F ratios in Table 3 can be
found in different form as the quantity subtracted from $MS_C$ in the numerator of
the reliability formulas in Table 2. The error terms are expressed in different
form in Table 3 because tests (17) through (23) are quasi-F tests, i.e., tests
involving more than two mean square terms in the F test. In this case, the F test
is an approximation which is obtained by adjusting the degrees of freedom for
both the numerator and denominator separately, by the formula given in
Satterthwaite (1946):


$$df_{adj} = \frac{(a_1 MS_1 + a_2 MS_2 + \cdots)^2}{\dfrac{(a_1 MS_1)^2}{df_1} + \dfrac{(a_2 MS_2)^2}{df_2} + \cdots} \tag{24}$$

where $MS_1$ and $MS_2$ are independent mean squares with $df_1$ and $df_2$
degrees of freedom, and $a_1$ and $a_2$ are the coefficients for the mean
squares. The mean squares in Table 3 are shown in a form that gives separate
coefficients for each mean square as required by Formula 24. In the case where
group size is unbalanced, and the coefficients, $a_i$, vary from company to
company, the quantity $a_i MS_i$ can be obtained most accurately by weighting
individual scores as appropriate (e.g., Formula 42, as described later).
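A short Python sketch of the Satterthwaite adjustment follows. It assumes the usual form of the approximation, in which each squared term in the denominator is divided by its own degrees of freedom; the function name and the mean squares are hypothetical.

```python
def satterthwaite_df(coefficients, mean_squares, dfs):
    """Adjusted degrees of freedom for a linear combination of mean squares."""
    terms = [a * ms for a, ms in zip(coefficients, mean_squares)]
    numerator = sum(terms) ** 2
    denominator = sum(t ** 2 / df for t, df in zip(terms, dfs))
    return numerator / denominator


# Error term built from two mean squares, e.g. the denominator of a quasi-F test.
df_adjusted = satterthwaite_df([1, 1], [10.0, 5.0], [4, 8])
print(df_adjusted)
```

The adjusted value need not be an integer, so it is usually rounded down or interpolated when a tabled critical value is consulted.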


Difference  Between  Reliabilities 


In some situations it is important to know whether reliabilities are
significantly different from each other. For example, using cross-lagged panel
correlation (Kenny, 1975), it is important to know whether reliability changes
over time. When reliability changes, corrections for reliability shifts are
made. A statistical test for reliability shifts is desirable and can be made
when the reliabilities are expressed in the form of F ratios as shown previously
in Table 3, and the assumption is made that the mean square terms are
independent. In the case where measurements are made on group units at more than
one point in time, with different subjects sampled on each occasion, the samples
involve the same group populations but different subjects. In analysis of
variance terms, the measurements are repeated across companies, but not across
subjects. The mean square terms under these conditions approximate independence.
The bias due to lack of independence is loss of power. Degrees of freedom are
large enough so that power is not low in any case. Following Winer (1971, pp.
245-247), hypotheses related to the equality of two F ratios can be tested as
follows:


$$F_L > F_S \cdot F_\alpha(df_{numerator},\, df_{denominator}) \tag{25}$$

where $F_L$ and $F_S$ represent reliabilities in the form of F ratios as shown in
Table 3, $F_L$ representing the larger F ratio and $F_S$ the smaller. To obtain
$F_\alpha$, the degrees of freedom in the numerator and denominator should
correspond to the degrees of freedom in the numerator and denominator of $F_L$
and $F_S$. The degrees of freedom for $F_L$ should approximately equal those for
$F_S$ for the test to be valid. When quasi-F ratios are used, the degrees of
freedom for $F_\alpha$ should correspond to adjusted degrees of freedom as given
in Equation (24). The test should be used with some caution with quasi-F ratios.
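The decision rule can be written as a small Python helper. This is only a sketch: the critical value is passed in by the caller rather than computed, the function name is hypothetical, and the example F ratios are invented. The value 2.98 is approximately the tabled upper 5 percent point of F with 10 and 10 degrees of freedom.

```python
def reliabilities_differ(f_one, f_two, f_critical):
    """Apply the rule F_L > F_S * F_alpha to two reliability F ratios."""
    f_large, f_small = max(f_one, f_two), min(f_one, f_two)
    return f_large > f_small * f_critical


# Two measurement occasions, each with about 10 and 10 degrees of freedom.
clear_shift = reliabilities_differ(6.0, 1.8, f_critical=2.98)
no_shift = reliabilities_differ(4.0, 1.8, f_critical=2.98)
print(clear_shift, no_shift)
```

Ordering the two ratios inside the function keeps the rule symmetric, so it does not matter which occasion is passed first.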


Sample  Size  Requirements 


Organizational  research  is  costly  and  time  consuming.  For  these  reasons,  it 
is  important  to  be  able  to  estimate  ahead  of  time  the  sample  sizes  needed  to 
obtain  specified  levels  of  reliability  desired  by  the  researcher.  How  many 
subjects  within  each  group,  and  how  many  items  in  a  scale  are  needed  to  obtain  a 
specified  level  of  reliability,  say  .75,  as  measured  by  the  formulas  in  Table  2? 
Estimates  of  the  mean  square  terms  in  Table  2  can  be  obtained  from  a  pretest 
sample,  and  from  the  pretest  sample  the  number  of  subjects  and  items  that  are 
needed  for  a  specified  level  of  reliability  can  be  estimated. 

The way this problem has been solved in the standard case, where individuals
are the unit of analysis, has been to estimate the reliability of a single score
(Formula 26, Table 4), which is related to the reliability of the average score
(Formula 2, Table 2) in terms of the Spearman-Brown prediction formula. Solving
the Spearman-Brown prediction formula for the sample size tells how many items
must be added to obtain the desired reliability (see Winer 1971, p. 287). This
same approach was used in Table 4 for other formulas. However, when the unit of
analysis involves a group, the reliability of single scores involves
contingencies: the reliability of a single item given the same number of subjects
as was found in the pretest sample, or the reliability of a single subject given
the same number of items as found in the pretest questionnaire. Given these
contingencies, the formulas in Table 4 are related to the corresponding formulas
in Table 2, in terms of the Spearman-Brown prediction formula. The corresponding
formulas are those with the same unit of analysis and sampling plan. As shown in
Table 4, sample size can then be found from the Spearman-Brown formula. Formula
(28) and the Spearman-Brown formula can be expressed in more convenient form by
solving (28) in terms of the F ratio, $F = MS_C / MS_S$, and substituting this
into the Spearman-Brown formula. The number of subjects needed in each group
($s_2$) can then be found as follows:

$$s_2 = \frac{s_1 R_w}{F(1 - R_w) + R_w - 1}$$

where $R_w$ equals the reliability desired, $s_1$ the sample size in each pretest
group, and $F = MS_C / MS_S$.

[Table 4. Reliability Formulas for Single Scores as a Function of Unit of Analysis and Sampling Plan]
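A worked example may help. The Python sketch below assumes the expression for the number of subjects has the numerator $s_1 R_w$ over the denominator $F(1 - R_w) + R_w - 1$, which follows from the Spearman-Brown formula when the single-subject reliability is $(F - 1)/(F + s_1 - 1)$. The function name and input values are hypothetical.

```python
import math


def subjects_needed(f_ratio, s1, r_desired):
    """Subjects per group needed to reach r_desired, from a pretest F ratio."""
    denominator = f_ratio * (1 - r_desired) + r_desired - 1
    if denominator <= 0:
        # F <= 1: no true variance detected, so no sample size is sufficient.
        return math.inf
    return s1 * r_desired / denominator


# Pretest: 10 subjects per company, F = MS_C / MS_S = 3 (reliability 1 - 1/3).
s2_for_80 = subjects_needed(f_ratio=3.0, s1=10, r_desired=0.80)

# Sanity check: asking for the pretest reliability returns the pretest size.
s2_same = subjects_needed(f_ratio=4.0, s1=10, r_desired=0.75)
print(s2_for_80, s2_same)
```

Doubling the subjects per company here raises the reliability from about .67 to .80, which shows the diminishing returns implied by the Spearman-Brown relationship.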


The problem with using Formulas (27) through (31) to estimate sample size
requirements is that the number of subjects needed ($s_2$) can only be estimated
given that the number of items to be used in the final questionnaire ($q_2$)
equals the number of items ($q_1$) in the pretest sample. The number of items
needed in the questionnaire ($q_2$) can only be estimated given that the number
of subjects to be used in the final sample ($s_2$) equals the number used in the
pretest ($s_1$). Also, if the unit of analysis is at a higher level than
companies, the pretest sample must be assumed to have the same subordinate group
structure as in the final sample. Another serious problem is that the preceding
approach does not work for some formulas, namely when subjects or items are
semirandom. There are problems with the concept of a single-score reliability in
the semirandom case.

The sample size requirement problem was solved for all formulas without any
contingencies, by estimating variance components from pretest data independently
of the number of subjects or items in the pretest, substituting the sample sizes
desired, $s_2$ and $q_2$, for the pretest coefficients $s_1$ and $q_1$ where they
appeared in the reliability definitions, and then solving for $s_2$ and $q_2$.
The required formulas are shown in Table 5. From Table 5, the number of subjects
or items required for any formula in Table 2 can be estimated from pretest data
without any contingencies. For example, a researcher can estimate the number of
subjects required ($s_2$), given that X number of items are added to a scale over
what existed in the pretest. Similarly, the number of items ($q_2$) can be
estimated, given that the sample size within each group in the final sample is
larger than it was in the pretest. Of course, the assumption is made that the
items that are added are intercorrelated to the same degree as the pretest items,
and subjects
Table 5

Formulas for Determining Sample-Size Requirements
from Pretest Data

Reliability                        Sample Size Formulas
Formula                                                                       Formula
Defining R_w (a)   Number of Subjects              Number of Items            Number

 3                 $s_2 = A/C$                     -                          (32)
 4                 $s_2 = A/(C + H)$               -                          (33)
 5                 -                               $q_2 = B/D$                (34)
 6                 -                               $q_2 = B/(D + I)$          (35)
 7                 $s_2 = A/(E - G)$               $q_2 = B/(E - F)$          (36)
 8                 $s_2 = A/(E - G + H)$           $q_2 = B/(E - F + H)$      (37)
 9                 $s_2 = A/(E - G + I)$           $q_2 = B/(E - F + I)$      (38)
10                 $s_2 = A/(E - G + H + I)$       $q_2 = B/(E - F + H + I)$  (39)

Note.
$A = s_1 R_w \left( MS_S - \frac{q_2 - q_1}{q_2}\, MS_{SQ} \right)$
$B = q_1 R_w \left( MS_{CQ} - \frac{s_2 - s_1}{s_2}\, MS_{SQ} \right)$
$C = MS_C (1 - R_w) - MS_S (1 - R_w)$
$D = MS_C (1 - R_w) - MS_{CQ} (1 - R_w)$
$E = MS_C (1 - R_w) - MS_S (1 - R_w) - MS_{CQ} (1 - R_w) + MS_{SQ} (1 - R_w)$
$F = R_w \frac{s_1}{s_2} (MS_S - MS_{SQ})$
$G = R_w \frac{q_1}{q_2} (MS_{CQ} - MS_{SQ})$
$H = R_w \frac{s_1}{N_s} (MS_S - MS_{SQ})$
$I = R_w \frac{q_1}{N_q} (MS_{CQ} - MS_{SQ})$

$R_w$ is the value of the reliability that the researcher wants to obtain in a new
sample. The symbol $s_2$ refers to the number of subjects within each group that is
needed to obtain the desired reliability $R_w$, while $s_1$ is the pretest sample
size within each group. Similarly, $q_2$ refers to the number of items needed to
obtain the stated $R_w$, while $q_1$ is the number of items in the pretest. $N_s$
is the population size within each company, while $N_q$ is the size of the
population of items. The mean square terms are based on the pretest data using the
original model given in Formula (1). The assumption that $\sigma^2_{SQ} = 0$ must
be made for Formulas (32), (33), (34), and (35). When A or B is the unit of
analysis $MS_A$ or $MS_B$ is substituted for $MS_C$, and $MS_{AQ}$ or $MS_{BQ}$ for
$MS_{CQ}$.

(a) The numbers refer to the reliability formulas found in Table 2.


added discriminate between groups to the same extent as in the pretest. The
formulas in Table 5 can be used for any of the units of analysis A, B or C,
without contingencies, using the appropriate substitutions given in this table.

Adding items to a survey scale will increase reliability as defined by
Formulas (3) and (4) only to a limited extent (i.e., by increasing the
coefficients of $\sigma^2_C$ and $\sigma^2_S$ in relation to $\sigma^2_E$), and
likewise increasing the number of subjects will increase reliability as defined
by Formulas (5) and (6) only to a limited extent (i.e., by increasing the
coefficients of $\sigma^2_C$ and $\sigma^2_{CQ}$ in relation to $\sigma^2_E$).
Therefore, it is not meaningful to solve the equations for items ($q_2$) for
Formulas (32) and (33), or for subjects ($s_2$) for Formulas (34) and (35).
Negative estimates from any of the formulas in Table 5 mean an infinity of
subjects or items would be needed to obtain the requisite reliability, i.e., the
desired level of reliability cannot be obtained by adding to the sample size.
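The Table 5 calculation for the number of subjects can be sketched in Python. The sketch covers only the subjects column for Formulas 7 and 8, and it assumes the following readings of the table's quantities, which are partly reconstructed: $A = s_1 R_w (MS_S - [(q_2 - q_1)/q_2] MS_{SQ})$, $E = (1 - R_w)(MS_C - MS_S - MS_{CQ} + MS_{SQ})$, $G = R_w (q_1/q_2)(MS_{CQ} - MS_{SQ})$, and $H = R_w (s_1/N_s)(MS_S - MS_{SQ})$. The function name and pretest values are hypothetical.

```python
import math


def subjects_required(ms_c, ms_s, ms_cq, ms_sq, s1, q1, q2, r_w, n_s=None):
    """Subjects per group for a target reliability r_w.

    With n_s=None, subjects are treated as random (s2 = A / (E - G));
    with a population size n_s, as semirandom (s2 = A / (E - G + H)).
    """
    a = s1 * r_w * (ms_s - (q2 - q1) / q2 * ms_sq)
    e = (1 - r_w) * (ms_c - ms_s - ms_cq + ms_sq)
    g = r_w * q1 / q2 * (ms_cq - ms_sq)
    h = 0.0 if n_s is None else r_w * s1 / n_s * (ms_s - ms_sq)
    denominator = e - g + h
    # A non-positive denominator means the target cannot be reached.
    return a / denominator if denominator > 0 else math.inf


# Pretest with s1 = 10 subjects and q1 = 5 items; final scale has q2 = 8 items.
pretest = dict(ms_c=67.0, ms_s=11.0, ms_cq=7.0, ms_sq=1.0, s1=10, q1=5)
s2_random = subjects_required(q2=8, r_w=0.80, **pretest)
s2_semirandom = subjects_required(q2=8, r_w=0.80, n_s=50, **pretest)
print(s2_random, s2_semirandom)
```

Fewer subjects are needed in the semirandom case because part of each company's population is already included in the sample, which reduces the error attributed to subject sampling.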


Unbalanced  Designs 


Effects  on  Formulas 


The derivation of all the previous formulas has been based on the assumption
of a balanced design, i.e., equal sample and group sizes across levels of all
factors. This, of course, rarely occurs in intact organizations that are of
interest here. The impact of unbalanced designs on the expected mean squares,
for the model at Equation (1), is shown in Table 6. When balanced formulas are
used to calculate the mean squares for the model at Equation 1 when the model is
not balanced, the resulting mean squares contain elements of variance components
from a variety of extra terms. A comparison of Tables 6 and 1 shows additional
components, or elements of these components, added by unbalance. How the
confounding is handled depends entirely on the hypotheses being tested. For
purposes of reliability estimation, researchers do not wish to generalize to
hypothetical organizations in which groups are all the same size, with equal
numbers of, say, blacks and whites in each. Such a balanced hypothesis is
clearly irrelevant and inappropriate for intact organizations. Generalizations
are made to the intact organization where subgroups vary. In the intact
organization the crossed term Race (R) and the subordinate hierarchical terms
B(A) and C(AB) are fixed. When these terms are all fixed, it is appropriate to
consider all confounded elements added by unbalance to the "between people"
components of $MS_A$, $MS_B$, or $MS_C$ as true variance, since that sort of
confounding exists naturally in the intact organization to which generalizations
are being made. However, when questionnaire items (Q) are considered random, all
confounded elements added by unbalance to the "within people" components of
$MS_A$, $MS_B$ or $MS_C$ can best be considered error. These confounded elements
all represent interactions with the random term Q. Since Q is random, items
change from one sample to another, and so would interactions with Q, which
suggests these confounded elements should be considered error. When the preceding
allocation of confounded elements is made between true and error variance, the
reliability formulas, tests, and sample size requirements given previously in
Tables 2, 3, 4 and 5 remain unchanged. However, it should be recognized that
reliability and test definitions contain additional confounded elements as shown
in Table 6.

[Table 6. Unbalanced Expected Mean Squares]

An additional problem remains for hypothesis testing with unbalanced
designs. Mean square terms are no longer independent, an assumption required for
numerators and denominators of F tests. Tests should be made with caution when
unbalance is severe. This problem is not unique to reliability estimation, and
is frequently encountered in unbalanced analysis of variance designs.


Weighting  Scores 

Unbalanced designs and sampling requirements often necessitate weighting
individual scores in order to appropriately estimate reliability. Since sample
size affects reliability, as shown previously, weights must be applied in a manner
that does not affect the total sample size. Weights are appropriate in the
following three situations.


First,  using  a  stratified  sampling  plan,  the  crossed  term  Race  (R)  might  not 
be  sampled  in  proportion  to  company  racial  populations.  Blacks  might  be  sampled 
at  a  higher  rate  in  order  to  get  a  sufficient  minority  sample  size.  When 
estimating  a  total  company  score,  ignoring  race,  the  individual  scores  within 
each  company  need  to  be  weighted  to  estimate  what  would  have  been  obtained 
without  disproportionate  sampling.  In  this  case  the  individual  scores  within 
each  company  are  weighted  according  to  the  following  formula: 


W_{Bi} = (N_{Bi} / N_{Ti}) / (n_{Bi} / n_{Ti})    (40)

where W_{Bi} represents the weight for black subjects in company i, N_{Bi} and N_{Ti} represent, respectively, the black and total population sizes in company i, and n_{Bi} and n_{Ti} represent, respectively, the black and total survey sample sizes in company i. To obtain the weight for white subjects in company i, N_{Wi} and n_{Wi}, representing, respectively, the population and sample sizes of whites in company i, are substituted for N_{Bi} and n_{Bi} in Formula (40).
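The weighting for disproportionate stratified sampling can be illustrated with a short numeric sketch. It assumes the weight is the stratum's share of the company population divided by its share of the company sample; the function name and the company figures are hypothetical and are used only for illustration. The check at the end shows that weights of this form leave the company's total sample size unchanged.

```python
def stratum_weight(pop_stratum, pop_total, n_stratum, n_total):
    """Population share of a stratum divided by its sample share (assumed form)."""
    return (pop_stratum / pop_total) / (n_stratum / n_total)

# Hypothetical company: 200 soldiers, 40 of them black.
# Hypothetical sample: 50 soldiers, 20 of them black (oversampled).
w_black = stratum_weight(40, 200, 20, 50)    # population share 0.20 / sample share 0.40
w_white = stratum_weight(160, 200, 30, 50)   # population share 0.80 / sample share 0.60

# The weighted number of cases equals the original company sample size.
weighted_n = 20 * w_black + 30 * w_white
print(w_black, w_white, weighted_n)
```

Oversampled subjects receive a weight below one and undersampled subjects a weight above one, so the weighted company score approximates what proportionate sampling would have produced.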


A second reason for weighting individual scores is to ensure that the units of analysis are weighted equally. Since each unit, as a data point, is weighted equally when used in correlation or other statistics, each unit should be weighted equally when estimating reliability. Typically, equal sample sizes are obtained from groups at the level intended for use as the unit of analysis, providing equal weights. However, weights equal at this level will not be equal at another level when hierarchical levels are confounded. Furthermore, a simple random sample may have been used, which will produce unequal weights when group sizes differ. In these cases, individual scores within each group or company are weighted as follows:


W_i = [expression not legible in the source]    (41)

where W_i is the weight given individual responses within each company, N_i and n_i represent, respectively, the population and sample size for company i, and N_T and n_T represent, respectively, the population and sample totals for all companies combined.


A third reason for weighting individual scores is to accurately estimate the error terms in Table 2 when subjects are considered semirandom (Formulas 4, 8, and 10). Each unit should be weighted equally in terms of sample size, but the company population sizes are unlikely to be equal as well. That means the sampling term (N - s) / N found in Table 2 will differ from company to company. In order to accurately estimate the error terms MS_S and MS_SQ for these semirandom formulas, individual scores within each company should be weighted as follows:

W_i = (N_i - s_i) / N_i    (42)

where W_i equals the weight in each company, and N_i and s_i represent, respectively, the population and sample sizes in each company. MS_S and MS_SQ, obtained from scores weighted by (42), are substituted in Formulas 4, 8, and 10 to replace the corresponding terms that are multiplied by (N - s) / N. The other mean square terms are estimated without weighting.
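A small worked example shows why the sampling term (N - s) / N cannot be treated as a single constant when company sizes differ. The company names and sizes below are hypothetical; only the term itself comes from the text.

```python
def sampling_term(population, sample):
    """Sampling term (N - s) / N for one company."""
    return (population - sample) / population

# Hypothetical companies: equal samples of 30 drawn from unequal populations.
companies = {"A": (120, 30), "B": (200, 30), "C": (90, 30)}

terms = {name: sampling_term(N, s) for name, (N, s) in companies.items()}
for name, term in terms.items():
    print(name, round(term, 3))
```

Even with identical sample sizes, the term ranges from about two thirds to 0.85 across these three companies, which is the situation the third type of weighting is meant to handle.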

The  three  types  of  weighting  given  in  Formulas  (40),  (41),  and  (42)  may  be 
used  separately  or  together  in  any  combination  as  appropriate.  The  weights  given 
in  (40)  and  (41)  maintain  the  original  sample  sizes  as  required. 


Synchronization  Measures 


Making  the  Measures  Comparable 

Synchronization measures are shown in Table 7. These measures are used for selecting a unit of analysis. High synchronization for a unit pinpoints the level of the organization that exercises responsibility and control over the subject matter represented by the scale. These measures provide a way of directly comparing the extent of synchronization at each level of the hierarchy, A, B and C. At each level of hierarchy the number of subjects within the unit of analysis increases. An increase in subjects also increases reliability as measured by Formula 3. Reliability as measured by Formula 3 is again used as a synchronization measure, but only for the lowest level in the hierarchy, in this case Companies (C). The synchronization definitions and formulas for the higher levels of hierarchy, B and A, are adjusted statistically so that they have the same number of subjects within groups at the higher levels as was found at the lowest level, C. With this adjustment, the synchronization measures all become directly comparable.

Table 7

Synchronization Measures for Determining the Unit of Analysis (a)

Unit of Analysis    Synchronization Definition    Formula                                     Number

Companies (C)       [not legible in source]       (MS_C - MS_S) / MS_C                        (43)
Battalions (B)      [not legible in source]       (MS_B - MS_S) / (MS_B + (c - 1) MS_S)       (44)
Brigades (A)        [not legible in source]       (MS_A - MS_S) / (MS_A + (bc - 1) MS_S)      (45)

(a) Subjects are considered random and items fixed. Formulas (44) and (45) differ from reliability formulas by an adjustment which makes the number of subjects within Brigades (A) and Battalions (B) hypothetically equal, for purposes of comparison, to the numbers within each company (C).

If a comparison of Battalion (B) and Brigade (A) synchronization is desired by itself, ignoring Companies (C), the sample size adjustment can be made on Brigades, making Brigades equal in size to the level just below, Battalions, as follows:

S_B = (MS_B - MS_S) / MS_B    (46)

S_A = (MS_A - MS_S) / (MS_A + (b - 1) MS_S)    (47)

where S_B equals synchronization for Battalions, and S_A equals synchronization for Brigades.
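Formulas (46) and (47) share one algebraic form, which the following sketch makes explicit. The function name, the parameter k, and the mean square values are hypothetical; k stands for the number of lower-level groups contained in each unit, so k = 1 gives the unadjusted form of (46) and k = b gives the adjusted form of (47).

```python
def synchronization(ms_group, ms_subjects, k=1):
    """(MS_group - MS_S) / (MS_group + (k - 1) * MS_S).

    k is the size adjustment: 1 for the lowest level being compared,
    b when Brigades are scaled down to Battalion size.
    """
    return (ms_group - ms_subjects) / (ms_group + (k - 1) * ms_subjects)

# Hypothetical mean squares and b = 3 Battalions per Brigade.
ms_b, ms_a, ms_s = 6.0, 10.0, 2.0
b = 3

s_b = synchronization(ms_b, ms_s)        # Formula (46)
s_a = synchronization(ms_a, ms_s, k=b)   # Formula (47)
print(round(s_b, 3), round(s_a, 3))
```

With these made-up values the Battalion measure is larger than the adjusted Brigade measure, which would point toward Battalions as the unit of analysis, subject to the significance tests described in the next section.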


Significance  of  Difference  Between  Measures 

With Formulas (43) to (45), the degree of synchronization can be compared directly for each level of hierarchy to determine the best unit of analysis. Finally, whether synchronization at one level is significantly greater than synchronization at another can be tested by forming appropriate quasi-F ratios, as shown in Table 8. Each of the synchronization measures shares a common "error" term, MS_S, which is ignored when comparing relative sizes of synchronization measures because it is held in common. Independent mean squares are needed for F ratios. Comparing synchronization can therefore be accomplished by comparing the relative sizes of the "total" variance that has been adjusted for equal group sizes, ignoring MS_S for the reason stated. Company synchronization is compared to Battalion and Brigade synchronization in Formulas (48) and (49), and Battalion to Brigade in (50). For the latter comparison, Brigade size is adjusted to equal Battalion size in order to get a test with independent mean squares in the numerator and denominator of the F test. Power is greater for the test in Formula (50) than for the tests in (48) and (49).

Table 8

Significance of Differences Between Synchronization Measures (a)

Comparison                        Test Definition            F Test                     Number

Companies (C) / Battalion (B)     [not legible in source]    [not legible in source]    (48)
Companies (C) / Brigade (A)       [not legible in source]    [not legible in source]    (49)
Battalion (B) / Brigade (A)       [not legible in source]    [not legible in source]    (50)

Note. Formula (48) as written tests whether company synchronization is greater than battalion synchronization. The numerator and denominator can be reversed to test whether battalion synchronization is greater.

(a) Degrees of freedom for quasi-F tests are found by referring to Formula (24) in the text.

When the hierarchical levels A, B and C are confounded, individual scores may need to be weighted by Formula (41) to ensure that each unit of analysis is weighted equally. The weights, when needed, will change as confounded hierarchical levels change. The coefficients c and bc in Formulas (44) and (45) are averages when the terms A, B, and C are confounded and weights are used. When different weights are applied at different hierarchical levels in a confounded design, the mean squares in the numerator and denominator of the preceding tests are no longer independent, so tests of the significance of the difference between synchronization measures should be used with caution in this case.


Removing  Synchronization 

When synchronization is found at more than one level of the hierarchy, the synchronization at the higher level can be partialed out using dummy regression, if desired. The existence of synchronization at each level can be tested by applying Formula (16) at each level of hierarchy to see if significant "true" variance can be identified at each level. The power of the test in Formula (16) is higher at higher levels. The number of degrees of freedom remaining after a higher-level group is partialed out may be reduced sharply as a result of removing synchronization. Removing synchronization from higher levels, however, would leave the researcher with results that could be unambiguously attributed to the lower-level unit and its leaders. Depending on hypotheses, this might be a desirable or an artificial result. It is possible, however, to statistically eliminate synchronization from higher levels when desired.
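Partialing out a higher level can be sketched briefly. A regression of scores on a complete set of dummy variables for the higher-level groups has fitted values equal to the group means, so its residuals are simply deviations from those means. The sketch below uses that equivalence rather than fitting the regression explicitly; the function name, company scores, and battalion labels are hypothetical.

```python
from collections import defaultdict

def partial_out(scores, groups):
    """Remove higher-level group means from lower-level scores.

    Equivalent to taking residuals from a regression on dummy variables
    for the higher-level groups. Returns the residuals and the degrees
    of freedom left after one is spent on each higher-level group.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for y, g in zip(scores, groups):
        totals[g] += y
        counts[g] += 1
    means = {g: totals[g] / counts[g] for g in totals}
    residuals = [y - means[g] for y, g in zip(scores, groups)]
    df_remaining = len(scores) - len(means)
    return residuals, df_remaining

# Hypothetical company scores nested in two battalions.
company_scores = [3.0, 5.0, 8.0, 10.0]
battalions = ["B1", "B1", "B2", "B2"]

residuals, df_remaining = partial_out(company_scores, battalions)
print(residuals, df_remaining)
```

In this toy case four company scores leave only two degrees of freedom once the two battalions are removed, which illustrates how sharply the degrees of freedom can fall.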


Computational Requirements


There are two primary difficulties in computing the reliability and synchronization measures and tests given in this paper. The most serious difficulty is the computer core space required to compute a large split-plot analysis of variance design. All of the commonly used general analysis of variance packages, including SAS, RUMMAGE, BMP, MULTIVARIANCE, and IMSL, greatly exceed the core limitations of virtually all computers for even modestly sized split-plot designs that involve even a moderate number of subjects. As the number of subjects in a split-plot design increases, factors that include subjects become huge. Commonly used analysis of variance packages attempt to store these huge factors in core. One exception is the BMDP2V program, which does not require an unreasonable amount of core but cannot compute the hierarchical portion of the design; only one level in a hierarchy is possible. A general analysis of variance program capable of analyzing any design was written to compute reliabilities for aggregated scores. The input data was organized by sorting to alleviate the cell storage problems. Multiple sorts are required for one run on a given model, but a large number of reliabilities can be computed during a single run.

The amount of computer CPU time taken to compute these reliabilities is a second problem. Most general analysis of variance packages create dummy variables to calculate either balanced or unbalanced designs, but in split-plot designs the number of dummy variables required is often huge, requiring large amounts of computer time. The general analysis of variance program that was written for computing reliabilities uses the balanced algorithm given previously. The balanced algorithm is appropriate for unbalanced data when confounded components in an unbalanced design are allocated between true and error variance, as outlined previously. The algorithm was modified slightly in order to make the algebra appropriate in the unbalanced as well as the balanced case. Looking back at the steps required to get sums of squares, step (c) follows immediately after step (a) when applied to the unbalanced case. Degrees of freedom are obtained by getting the sum of the cells associated with main effects that are listed, instead of the product of the levels of the main effects listed, as given for the balanced case (see p. 4). The balanced algorithm in this program computes reliabilities much more rapidly than do programs that generate dummy variables. Multiple sorts on input data do, however, take some I/O ("wall clock") time, but this is required to alleviate the more serious core storage problems.²
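The benefit of sorted input can be illustrated with a minimal sketch. This is not the authors' program: it is an assumed one-way, unbalanced case in which records arrive sorted by group, so only running totals are held in memory rather than every cell. The function name and the data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def one_way_ss(sorted_records):
    """Between- and within-group sums of squares from (group, score)
    records that are already sorted by group, using running totals only."""
    n_total, n_groups = 0, 0
    grand_sum, raw_ss, group_term = 0.0, 0.0, 0.0
    for _, rows in groupby(sorted_records, key=itemgetter(0)):
        n, s = 0, 0.0
        for _, y in rows:
            n += 1
            s += y
            raw_ss += y * y
        group_term += s * s / n      # (group sum)^2 / group size
        grand_sum += s
        n_total += n
        n_groups += 1
    correction = grand_sum * grand_sum / n_total
    ss_between = group_term - correction
    ss_within = raw_ss - group_term
    return ss_between, ss_within, n_groups - 1, n_total - n_groups

# Hypothetical unbalanced data: two companies with 2 and 3 subjects.
records = sorted([("c2", 8.0), ("c1", 2.0), ("c2", 6.0), ("c1", 4.0), ("c2", 10.0)],
                 key=itemgetter(0))
ss_between, ss_within, df_between, df_within = one_way_ss(records)
print(ss_between, ss_within, df_between, df_within)
```

The sort costs I/O time up front, but the pass over the data needs storage for only a handful of accumulators, which mirrors the trade-off described above.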


Summary 


When research is conducted with intact organizations, groups rather than individuals are used frequently as the unit of analysis. One advantage of using groups as units is that, in this case, interaction within these groups can be studied. If groups are selected as the unit of analysis, what level of the organizational hierarchy should be selected for study? A statistical technique is suggested for selecting groups at the most appropriate level of the organizational hierarchy, at a level that actually controls and is responsible for the subject matter. This technique measures the extent of synchronization within groups at different levels of the hierarchy. The level selected for the unit should generally be the level with greatest synchronization.

After selecting an appropriate group unit of analysis, how should reliability be estimated? Survey variables consist of scores aggregated over both subjects within groups and survey items. The traditional methods of estimating reliability are either incomplete or inappropriate when applied to estimating the reliability of these aggregated scores. Using analysis of variance, appropriate reliability formulas were derived that depend on both the unit of analysis and survey sampling plan. In addition, significance tests for these reliabilities were given, as well as formulas to determine sample-size requirements from pretest data. A technique for estimating the reliability of record data, in the form of frequency counts within groups, is also given. Together, these statistical techniques provide improved methods for studying the operation of organizations.


² Information about the availability of this computer program may be obtained by writing the authors at Army Research Institute Field Unit, P.O. Box 5787, Presidio of Monterey, CA 93940. The program has been written so that it is easy to use with simple model input statements. Implementation on different computers could pose problems, depending on the extent to which the program is given continued attention and development by the authors.